Skip to content

[core] Fix silent data corruption due to overflow in FieldProductAgg#7906

Open
ArnavBalyan wants to merge 1 commit into
apache:masterfrom
ArnavBalyan:arnavb/fix-product-overflow
Open

[core] Fix silent data corruption due to overflow in FieldProductAgg#7906
ArnavBalyan wants to merge 1 commit into
apache:masterfrom
ArnavBalyan:arnavb/fix-product-overflow

Conversation

@ArnavBalyan
Copy link
Copy Markdown
Member

@ArnavBalyan ArnavBalyan commented May 19, 2026

Purpose

  • FieldProductAgg multiplies integers without overflow detection.
  • Accumulator silently wraps corrupting downstream product aggregations.
  • Throw on overflow using Math multiplyExact and range checks.
  • Previously the data corruption was silently ignored. However with this change, an overflow throws a hard exception. Downstream consumers should be aware.

Tests

  • UT

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @JingsongLi thanks! :)

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @leaves12138 @JingsongLi can you pls review thanks!

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of FieldProductAgg overflow detection

The intent is good — silent integer overflow during aggregation is indeed a correctness bug that corrupts data. The agg() fix for INTEGER/BIGINT using Math.multiplyExact is clean. The toExactByte/toExactShort helpers are correct since Java promotes byte * byte and short * short to int before the multiplication, so the int result can be range-checked without itself overflowing.

Issues

1. retract() method has the same overflow bug but is not fixed

paimon-core/src/main/java/org/apache/paimon/mergetree/compact/aggregate/FieldProductAgg.java

The retract method performs division, which can also overflow silently in the same cases:

  • (byte) ((byte) Byte.MIN_VALUE / (byte) -1) → result is 128, which wraps to -128 when cast to byte
  • (short) ((short) Short.MIN_VALUE / (short) -1) → result is 32768, wraps to -32768
  • (int) Integer.MIN_VALUE / (int) -1 → wraps to Integer.MIN_VALUE in Java
  • (long) Long.MIN_VALUE / (long) -1 → wraps to Long.MIN_VALUE in Java

If the goal of this PR is to eliminate silent data corruption from overflow, the retract() path should get the same treatment for consistency. At minimum, the TINYINT and SMALLINT cases need the same range-check pattern since the int promotion makes the overflow detectable. For INTEGER and BIGINT, you would need explicit checks (if (accumulator == Integer.MIN_VALUE && inputField == -1) etc.) since Java's / operator does not throw.

2. Exception message could include operand values for debuggability

When a compaction task fails with ArithmeticException("byte overflow"), the operator has no idea which values caused it. Consider including the operands:

throw new ArithmeticException(
    String.format("byte overflow: %d * %d = %d", (byte) accumulator, (byte) inputField, value));

This is especially important because this exception will surface during compaction, potentially long after the data was written, making reproduction difficult.

3. Consider the failure mode during compaction

An unchecked ArithmeticException thrown during compaction will fail the compaction task. Depending on the retry/error-handling policy, this could block compaction indefinitely for a partition that has overflowing values. This is arguably better than silent corruption, but it would be worth documenting this behavioral change (e.g., in the PR description or release notes) so users understand that previously-silent overflows will now cause task failures.

Minor notes

  • The test cases are well chosen (boundary values, both positive and negative overflow). Good coverage.
  • FLOAT/DOUBLE overflow to infinity is not addressed, which is fine — that is standard IEEE 754 semantics and expected for floating-point types.
  • FieldSumAgg has the same class of overflow bugs, but that is out of scope for this PR.

Overall this is a useful correctness fix. The main request is to apply the same protection to retract() for completeness, since a half-fix leaves an inconsistency where agg() detects overflow but retract() on the same type silently corrupts.

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-product-overflow branch from 53057d1 to 0885945 Compare May 23, 2026 15:00
@ArnavBalyan
Copy link
Copy Markdown
Member Author

ArnavBalyan commented May 23, 2026

Thanks for the review! Have addressed all comments:

  1. retract() method has the same overflow bug but is not fixed

    • Covered all cases, division and multiplication are now guarded and will not silently corrupt data.
  2. Exception message could include operand values for debuggability

    • Made the exception more clear and descriptive.
  3. Consider the failure mode during compaction

    • Added more documentation to this PR for posterity.

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of FieldProductAgg overflow detection

Overall: The PR addresses a real correctness issue — integer multiplication/division overflow silently wrapping and corrupting aggregation results. The approach is sound and the fix is complete across both agg() and retract() paths.

What's good

  1. Correctness: The overflow detection logic is correct for all four integer types. byte/short use int-promoted arithmetic with range checks (safe because int can hold the full product/quotient range). int/long use Math.multiplyExact for multiply and explicit MIN_VALUE / -1 guards for divide.
  2. Error messages: Include operand values, which is critical for debugging compaction failures.
  3. Test coverage: Tests cover multiply overflow on all 4 integer types + retract (division) overflow on all 4 types, with both positive and negative overflow cases.
  4. Float/Double left alone: Correct — IEEE 754 overflow to infinity is expected behavior.

Minor suggestions

1. multiplyExactInt/multiplyExactLong — catch-rethrow pattern loses the original exception cause

} catch (ArithmeticException e) {
    throw new ArithmeticException(String.format("int overflow: %d * %d", a, b));
}

The new exception doesn't chain e as a cause. ArithmeticException lacks a cause constructor, but initCause() works:

ArithmeticException ex = new ArithmeticException(String.format("int overflow: %d * %d", a, b));
ex.initCause(e);
throw ex;

2. divideExactByte/divideExactShort — consider aligning with int/long pattern for consistency

The only case where integer division can overflow is MIN_VALUE / -1. The general value > MAX || value < MIN range check works correctly, but you could match the explicit MIN_VALUE && -1 check pattern used by divideExactInt/divideExactLong for consistency across all four types.

3. Behavioral change during compaction

An ArithmeticException during compaction will crash the task. This is better than silent corruption, but for existing tables that already contain overflowing data, upgrading Paimon could cause compaction to fail on those partitions with no easy recovery path. Worth noting in release notes.

4. FieldSumAgg has the same class of bug

FieldSumAgg also does unchecked (byte) accumulator + (byte) inputField etc. with the same silent overflow risk. Out of scope for this PR, but worth opening a follow-up issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants